ABSTRACT

Integrating viral vectors are efficient gene transfer 
tools, but their integration patterns have been 
associated with genotoxicity and oncogenicity. The 
recent development of highly specific designer nu- 
cleases has enabled target DNA modification and 
site-specific gene insertion at desired genomic 
loci. However, a lack of consensus exists regarding 
a perfect genomic safe harbour (GSH) that would 
allow transgenes to be stably and reliably expressed 
without adversely affecting endogenous gene 
structure and function. Ribosomal DNA (rDNA) has 
many advantages as a GSH, but efficient means to 
target integration to this locus are currently lacking. 
We tested whether lentivirus vector integration can 
be directed to rDNA by using fusion proteins con- 
sisting of the Human Immunodeficiency Virus 1 
(HIV-1) integrase (IN) and the homing endonuclease 
I-PpoI, which has natural cleavage sites in the rDNA. 
A point mutation (N119A) was introduced into I-PpoI 
to abolish unwanted DNA cleavage by the endo- 
nuclease. The vector-incorporated IN-I-PpoIN119A 
fusion protein targeted integration into rDNA signifi- 
cantly more than unmodified lentivirus vectors, with 
an efficiency of 2.7%. Our findings show that 
IN-fusion proteins can be used to modify the inte- 
gration pattern of lentivirus vectors, and to package 
site-specific DNA-recognizing proteins into vectors 
to obtain safer transgene integration. 



INTRODUCTION 

At present, the most efficient methods available for 
site-directed gene addition into human cells are based on 
DNA double-strand break (DSB)-enhanced homologous 
recombination (HR) (1). The site-specific cleavage of 
genomic DNA is catalysed using zinc finger nucleases 
(ZFNs), meganucleases or transcription activator-like 
effector nucleases (TALENs) (1-3). In the presence of a 
suitably designed homology-containing donor DNA 
molecule, insertion of exogenous sequences can occur at 
the cleaved site through homology-directed repair (HDR). 
Cellular expression of the nuclease protein is often 
achieved with DNA transfection methods, which can be 
difficult to translate into whole organisms [reviewed in 
(4)]. Integration-defective lentivirus vectors (IDLVs) have 
provided another means to enhance the delivery of both 
nuclease expression cassettes and the donor construct into 
cells (5). IDLVs can be used for both in vitro and in vivo 
transductions, but as a tool for delivering the 
recombination reaction components, they too suffer from 
limitations. First, the inability to control expression of the 
nuclease from the unintegrated vector is a drawback, and 
may lead to either over-expression-related cytotoxicity or 
inadequate enzyme levels in cells. Second, transduction of 
a target cell with the two to three required IDLVs simul- 
taneously may be challenging. Moreover, any cDNA 
imported into the nucleus may become illegitimately 
integrated into the genome, possibly allowing constant ex- 
pression of the imported genes. In the case of nucleases, 
this could predispose cells to genotoxicity and chromo- 
somal instability. 

A protein transduction method was recently applied for 
the cellular delivery of a meganuclease and its recombination 
substrate (6). The method relies on the expression of 
an HIV-1 Vpr fusion protein in vector-producing cells to 
obtain inclusion into virions (7). Although this trans-packaging 
method is efficient with regard to foreign 
protein incorporation, it predisposes transduced cells to 
undesired side effects of Vpr, which include induction of 
cell cycle arrest and apoptosis [for a review, see (8)]. 
Moreover, the Vpr trans-packaging approach requires 
introduction of an extra plasmid into lentivirus vector 
production, as this dispensable gene has been deleted from the 
latest lentivirus vector (LVV) generations (9). 

We have previously demonstrated that HIV-1 IN fusion 
proteins can be used as a cis-packaging method to deliver 
proteins of interest into transduced cells' nuclei while 
retaining some level of integration activity (10). IN-fusion 
proteins have been created before with the aim of directing 
transgene integration into predetermined sites (11-14). 
Despite their ability to target integration in in vitro 
reactions, IN-fusion proteins functioned only at a 
modest efficiency in cultured cells (15). The efforts to 
affect lentiviral integration patterns were redirected to 
modifying the DNA specificity of the lens epithelium-derived 
growth factor (LEDGF/p75), after identifying its 
role in tethering IN to the chromatin (16-19). 

We have generated new IN-fusion proteins with the aim 
of testing their applicability for further modifications of 
the protein content and the integration characteristics of 
third-generation LVVs. I-PpoI is a dimeric 18-20 kDa 
homing endonuclease protein of the slime mold 
Physarum polycephalum, which has a natural 15-bp recognition 
site present in the highly conserved 28S ribosomal 
RNA (rRNA) genes of eukaryotes (20,21). Each diploid 
human cell has about 600 copies of the rRNA genes in five 
clusters localized to the short arms of the acrocentric 
chromosomes 13, 14, 15, 21 and 22 (22). The tandemly 
repeated genes, collectively called the ribosomal DNA 
(rDNA), become transcribed in the nucleoli that form at 
the end of mitosis around rDNA (23). Owing to the wealth 
of rRNA genes and the presence of spacers between the 
gene repeats that likely confine natural insulator functions 
(24), the rDNA is an appealing safe harbour for transgene 
integration. We fused I-PpoI with HIV-1 IN to generate a 
fusion protein that would concomitantly answer two of 
our study questions: Can the cis-protein packaging strategy 
be used for the cellular delivery of a site-specific meganuclease, 
and can IN-fusion proteins promote targeted 
integration into a good GSH candidate, the rDNA? 

MATERIALS AND METHODS 

Plasmids 

IN-fusion constructs were cloned as described (10). The 
cDNA for I-PpoI was PCR amplified from the plasmid 
pCNPpo6 (a kind gift from Dr. Raymond J. Monnat Jr), 
using the primers I-PpoI Forw 
(5'-ATTCACCACTAGTGCTCCAAAAAAAAAGCGC-3') and I-PpoI Rev 
(5'-TATGGCCTCTCAGGCCATTATTATACCACAAAGTGACTGCC-3'). 
The N119A-mutated I-PpoI (25) was created with the 
QuikChange® II XL Site-Directed Mutagenesis Kit 
(Stratagene) using the primers N119A Forw 
(5'-GGGAGTCACTAGACGACGCCAAAGGCAGAAACTGGTGCC-3') and N119A Rev 
(5'-GGCACCAGTTTCTGCCTTTGGCGTCGTCTAGTGACTCCC-3'). 
Expression cassettes for the His-tagged IN-I-PpoI in the plasmid 
pBVboostFG were created for recombinant protein production in 
insect cells using the GATEWAY™ Cloning Technology 
(Gibco-BRL®, Life Technologies) (26). A double-stranded 
oligonucleotide containing the I-PpoI recognition sequence 
CTCTCTTAAGGTAGC was inserted into the EcoRV site of 
pBluescript II (Stratagene) to prepare a plasmid containing a 
single cleavage site for I-PpoI. All oligonucleotides used in 
cloning were purchased from Oligomer Oy (Helsinki, 
Finland). 
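
As a quick consistency check on the mutagenesis primers listed above, the following minimal sketch tests whether the N119A forward and reverse primers are exact reverse complements of each other, as would be expected for a complementary mutagenic primer pair. The function and variable names are illustrative and not part of the published protocol; the sequences are copied from the text with the line breaks removed.

```python
# Illustrative check only; names are hypothetical, sequences are from the text.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


N119A_FORW = "GGGAGTCACTAGACGACGCCAAAGGCAGAAACTGGTGCC"
N119A_REV = "GGCACCAGTTTCTGCCTTTGGCGTCGTCTAGTGACTCCC"


def primers_are_complementary(forward: str, reverse: str) -> bool:
    """True if the reverse primer is the exact reverse complement of the forward primer."""
    return reverse_complement(forward) == reverse.upper()


if __name__ == "__main__":
    print(primers_are_complementary(N119A_FORW, N119A_REV))
```

The same helper can be reused to derive the bottom-strand form of the 15-bp I-PpoI recognition sequence.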

Recombinant IN-fusion protein production, purification 
and in vitro testing 

Baculovirus and protein production in insect cells were 
carried out as described (26,27). The recombinant 
His-tagged IN-I-PpoI and wt IN proteins were purified 
from infected insect cells using the BD TALON™ 
Metal Affinity Resin (BD Biosciences). The elution frac- 
tions containing the largest amount of recombinant 
protein were identified with western blot and used for 
in vitro DNA cleavage testing. Digestion mixtures were 
set up using 300 ng of pBluescript II containing the 
I-PpoI site and either control I-PpoI (Promega) or the 
purified recombinant proteins wt IN or IN-I-PpoI. 
Digestions were carried out at 37°C for 2 h, after which 
the ScaI buffer and enzyme (Fermentas) were added to 
perform a double digestion. Digestions were verified by 
agarose gel electrophoresis. 

Vector and virus-like particle production 

Vesicular stomatitis virus G-glycoprotein (VSV-G) 
pseudotyped third-generation HIV-1-based LVV stocks 
containing the IN-fusion proteins were prepared and titred 
as described (10). The core packaging plasmids used 
were pMDLg/pRRE, pMDLg/pRRE-IND64V, 
pMDLg/pRRE-IN-I-PpoI or pMDLg/pRRE-IN-I-PpoIN119A 
(Figure 1). Vectors containing mixed IN-molecule 
multimers were produced using two different packaging 
plasmids in equimolar amounts. Virus-like particles 
(VLPs) were produced with the same protocol but 
without a transfer construct. 

Cells, transduction and cytotoxicity assay 

Human embryonic kidney HEK 293 (ATCC: CRL-1573™), 
HeLa cells (ATCC: CCL-2™) and MRC-5 cells 
(ATCC: CCL-171™) were cultured in Dulbecco's modified 
Eagle's medium (DMEM; Sigma) supplemented with 1% 
Penicillin-Streptomycin (Sigma) and 10% Fetal Bovine 
Serum (FBS; Hyclone) at 37°C in a 5% CO2-containing 
humidified atmosphere. The culture medium for MRC-5 
additionally contained 1% Non-Essential Amino Acid 
Solution (Sigma) and 1% Sodium Pyruvate solution 
(Sigma). Cells were transduced with LVVs diluted into 
prewarmed culture media. The cytotoxicity tests were done 
on transduced HeLa and MRC-5 cells using the 
CellTiter-Glo Luminescent Cell Viability Assay (Promega). 
The day before transduction, 6000 HeLa cells/well and 
10000 MRC-5 cells/well were seeded onto 96-well 
microplates (B&W Isoplate-96 TC, PerkinElmer). Cells 
were transduced with LVVs using 2 and 10 ng of p24 per 
well. At 24, 48 and 72 h after transduction, each plate was 
assayed by adding the CellTiter-Glo Reagent and reading 
the luminescence. 

Figure 1. Production of lentivirus vectors and virus-like particles. (A) Lentivirus vector (LVV) and virus-like particle (VLP) production plasmids. 
The packaging plasmids were used either alone or mixed in equimolar amounts to generate LVVs and VLPs containing a single type of IN-molecule 
or mixed IN-multimers, respectively. (B) The IN-molecule content of LVVs was detected by an immunoblot using antiserum to HIV-1 IN. The 
vector-contained IN molecules on each lane are listed on the right. Expected molecular weights for IN/IND64V: 32 kDa; IN-I-PpoI(N119A): 51 kDa. 
PRO, protease; RT, reverse transcriptase; RRE, Rev-responsive element; pA, polyadenylation signal; CMV, human cytomegalovirus immediate-early 
enhancer/promoter; cPPT, central polypurine tract; hPGK, human phosphoglycerate kinase promoter; GFP, green fluorescent protein; WPRE, 
Woodchuck hepatitis virus post-transcriptional regulatory element; SIN, self-inactivated LTR; LTR, long terminal repeat; RSV, Rous Sarcoma 
Virus promoter; VSV-G, Vesicular stomatitis virus G glycoprotein; IDLV, integration deficient lentivirus vector; Wt, wild type. 

Western blot 

Recombinant proteins and correct packaging of the different 
IN proteins into LVVs were verified by western blot using 
antisera to HIV-1 IN, amino acids 23-34 (Cat. No. #757) 
obtained through the NIH AIDS Research & Reference Reagent 
Program, and the secondary antibody Goat Anti-Rabbit 
IgG (H+L)-AP Conjugate (Bio-Rad Laboratories). 
Lentiviral vector preparations were lysed in Laemmli 
buffer and denatured at 95°C for 5 min before separation 
on 10-12% sodium dodecyl sulphate-polyacrylamide gel 
electrophoresis (SDS-PAGE) gels. Proteins were transferred 
onto nitrocellulose membranes (0.2 µm, Trans-Blot Transfer 
Medium, Bio-Rad) and probed with antibodies. 

Immunofluorescence staining and scanning 
confocal microscopy 

MRC-5 cells grown on Poly-L-lysine (Sigma)-treated 
coverslips or Lab-Tek™ II Chambered Coverglasses 
were transduced with LVVs and VLPs using the same 
vector amounts as in the cytotoxicity assay, or treated 
with H2O2 (Sigma). After 4 to 6 h of transduction, cells 
were fixed and processed by immunocytochemistry using 
the directly labelled primary antibody Mouse IgG2b, κ 
Alexa Fluor 647 Anti-H2A.X-Phosphorylated (Ser139) 
Antibody (BioLegend) and the primary antibody rabbit 
polyclonal to Fibrillarin (ab5821; Abcam) with the 
secondary antibody AF546 goat-anti-rabbit (Invitrogen). 
Nuclei were highlighted by mounting the samples with 
ProLong Gold antifade reagent with DAPI (Invitrogen, 
Carlsbad, CA, USA). Confocal microscopy images were 
acquired at room temperature with a Zeiss LSM 700 
confocal microscope operated with Zeiss Zen software 
(Carl Zeiss Microimaging, Jena, Germany) and 
combined using Adobe Photoshop Elements 5.0. 

Flow cytometry 

To measure GFP expression kinetics after LVV transduc- 
tion, HeLa cells were transduced with varying amounts of 
different vectors to obtain similar initial fluorescence 
levels. The day before transduction, HeLa cells were 
plated at 1 x 10^ cell/well onto six-well plates. Single-cell 
suspension samples were taken from the wells between 
days 1-28 post transduction, after which cells were 
replated. Samples were analysed with the BD FACSCanto II 
and FACSDiva software (BD Biosciences). The 
relative integration efficiency was estimated as the percentage 
of GFP-positive cells relative to the day 2 peak value, which 
can include transient expression from unintegrated vector. 
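
The normalization described above can be sketched as follows. This is a minimal illustration, assuming the GFP-positive percentages are available per sampling day; the function name and the example values are hypothetical and are not data from this study.

```python
# Minimal sketch of the day-2 normalization; names and numbers are hypothetical.
def relative_integration(gfp_positive_percent, reference_day=2):
    """Express GFP-positive percentages relative to the reference-day value (set to 100%)."""
    reference = gfp_positive_percent[reference_day]
    if reference <= 0:
        raise ValueError("reference-day value must be positive")
    return {
        day: 100.0 * value / reference
        for day, value in sorted(gfp_positive_percent.items())
    }


# Hypothetical time course: day -> percentage of GFP-positive cells.
example = {2: 40.0, 10: 10.0, 28: 8.0}
normalized = relative_integration(example)

if __name__ == "__main__":
    for day, value in normalized.items():
        print(f"day {day}: {value:.1f}% of the day-2 value")
```

Values that remain after transient expression has decayed are the ones that reflect stable integration.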

Extracting genomic integration sites (IS) 

MRC-5 cells were transduced with LVVs at varying 
multiplicities of infection (MOI). Cells were cultivated 
for 7-14 days, pelleted and stored at −70°C until used. 
LM-PCR was carried out as described (28) with modifications. 
Briefly, 2-2.5 µg of genomic DNA from transduced 
MRC-5 cells was digested using AvrII, SpeI and NheI, 
purified and ligated to linkers. The ligation mixture was 
heat-inactivated, and diluted to a volume of 80-100 µl. 
Phire™ or Phusion® hot start DNA polymerases 
(Finnzymes Oy, Espoo, Finland) were used for LM-PCR 
with the following cycling: 1 × 98°C 30 s; 7 × (98°C 5 s, 
72°C 1 min 20 s); 37 × (98°C 5 s, 66°C 1 min 20 s); 1 × 72°C 
4 min; 4°C. The primary LM-PCR products were diluted 
1:50, and used for nested PCR with the two-step PCR 
conditions: 1 × 98°C 30 s; 37 × (98°C 5 s, 72°C 1 min 20 s); 
1 × 72°C 4 min. The secondary barcoded LM-PCR 
products were purified using the ChargeSwitch® PCR 
Clean-Up Kit (Invitrogen), pooled and subjected to 
next-generation sequencing (454 Life Sciences GS FLX 
Titanium pyrosequencing platform, Beckman Coulter 
Genomics, MA, USA). 

Bioinformatics methods 

Paired-end pyrosequencing reads were first decoded using 
exact match to DNA barcodes included in the second 
round of PCR. The resulting collection of sequences was 
aligned against three different target sequences using 
BLAT [BLAST-like alignment tool, (29)] with >95% 
match score: (i) the LTR-specific ASB1 primer of the 
second PCR, (ii) the linker-specific ASB16 primer of the 
second PCR and (iii) 100 bp viral LTR sequence. For a 
read to be considered as a valid integration event, it was 
required to match all of the following filtering criteria: (i) 
must have a valid alignment to both primers starting 
within the first 5 bases, (ii) the alignment against viral 
LTR should contain the last 22 bp directly next to the 
genomic DNA junction and (iii) the summed span of 
alignment against primers and LTR sequence should be 
<95% of the total read size. Reads starting with ASB16 
primers were reverse complemented to correct the genomic 
orientation. The curated reads were then processed as 
described (30). Additionally, to control for contamination 
or false positive decoding resulting from sequencing 
errors, each IS was checked for presence in more than 
one sample. The sample hosting the IS with higher 
sequence abundance was given priority over other 
samples sharing the same IS. In cases of ties, the IS was 
removed altogether. The short arms of the acrocentric 
chromosomes that contain the rDNA are not included 
in the human genome assembly NCBI36/hg18. 
Integration sites in the rDNA were therefore analysed 
by BLAT-aligning reads to the human genome assembly 
GRCh37/hg19 and counting unique ISs in the unplaced 
supercontig ChrUn_gl000220 (GenBank: GL000220.1). 
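
The read-filtering criteria and the cross-sample conflict rule described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: it assumes the BLAT alignments have already been summarized per read, and the class, field and function names are hypothetical. Offsets are taken to be 0-based, so "within the first 5 bases" is interpreted as an offset below 5.

```python
# Hedged sketch of the filtering logic; all names are hypothetical.
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Hashable, Optional, Set


@dataclass
class ReadAlignment:
    read_length: int
    ltr_primer_start: Optional[int]     # 0-based offset; None if no valid alignment
    linker_primer_start: Optional[int]  # 0-based offset; None if no valid alignment
    ltr_has_terminal_22bp: bool         # LTR alignment reaches the genomic junction
    primer_and_ltr_span: int            # summed span aligned to primers and LTR


def passes_filters(read: ReadAlignment,
                   max_primer_offset: int = 5,
                   max_span_fraction: float = 0.95) -> bool:
    """Apply criteria (i)-(iii) from the text to one summarized read."""
    primers_ok = all(
        start is not None and start < max_primer_offset
        for start in (read.ltr_primer_start, read.linker_primer_start)
    )
    span_ok = read.primer_and_ltr_span < max_span_fraction * read.read_length
    return primers_ok and read.ltr_has_terminal_22bp and span_ok


def resolve_shared_sites(
    abundance: Dict[str, Dict[Hashable, int]]
) -> Dict[str, Set[Hashable]]:
    """Assign each site to the sample with the highest abundance; drop ties."""
    by_site = defaultdict(list)
    for sample, sites in abundance.items():
        for site, count in sites.items():
            by_site[site].append((count, sample))
    kept: Dict[str, Set[Hashable]] = {sample: set() for sample in abundance}
    for site, hits in by_site.items():
        top = max(count for count, _ in hits)
        winners = [sample for count, sample in hits if count == top]
        if len(winners) == 1:  # a tie removes the site altogether
            kept[winners[0]].add(site)
    return kept
```

A site seen in only one sample is kept for that sample unchanged, since it is trivially the unique top hit.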



The unplaced contig was analysed to contain one full 
and one slightly shortened rRNA gene repeat by 
BLAT-aligning different rRNA gene features from the 
Human ribosomal DNA complete repeating unit 
(GenBank: U13369.1) to GRCh37/hg19. Integration 
sites for sequences that matched equally well to either of 
the rRNA genes on the contig were placed on the first 
(full) rRNA gene repeat. ChrUn_gl000220 was given 
priority in cases where an IS could match equally well to 
the unplaced contig in hg19 and to rRNA gene fragments 
scattered in the non-acrocentric chromosomes in hg18. IS 
in the rDNA were visualized by adding IS 
information-containing custom tracks (Supplementary Methods) to the 
UCSC Genome Browser on Human Feb. 2009 Assembly 
(GRCh37/hg19) (31). The total number of IS for different 
data sets in NCBI36/hg18 was corrected with the 
ChrUn_gl000220 localized hits. Genomic I-PpoI sites 
were searched for by using NCBI BLASTN 2.2.26+ (32) 
on the reference assembly GRCh37.p5. Variable-sized 
windows around each IS from IN-I-PpoIN119A-containing 
vectors were tested to detect the abundance of I-PpoI sites 
in the region. Owing to sparse location of the sites, the 
abundance was saturated around a megabase window. 
New sequences were deposited in the NCBI GenBank 
sequence database (accession numbers JS886887- 
JS920506). 
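
The site search and window analysis described above can be illustrated with a simple exact-match scan. This sketch is a stand-in for demonstration only: the study used NCBI BLASTN, whereas the code below performs plain string matching for perfect sites on both strands. The function names and the definition of the window as a total width centred on the integration site are assumptions.

```python
# Illustrative exact-match scan; not the BLASTN search used in the study.
I_PPOI_SITE = "CTCTCTTAAGGTAGC"  # 15-bp recognition sequence quoted in the Plasmids section

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_sites(sequence, site=I_PPOI_SITE):
    """Return sorted (start, strand) tuples for perfect matches on either strand."""
    seq = sequence.upper()
    hits = []
    # The site is not palindromic, so the two patterns are distinct.
    for strand, pattern in (("+", site), ("-", reverse_complement(site))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)
    return sorted(hits)


def count_sites_in_window(site_starts, integration_pos, window):
    """Count sites whose start lies within window/2 of the integration position."""
    half = window / 2
    return sum(1 for start in site_starts if abs(start - integration_pos) <= half)


if __name__ == "__main__":
    toy = "AAAA" + I_PPOI_SITE + "CCCC" + reverse_complement(I_PPOI_SITE) + "GG"
    print(find_sites(toy))
```

Repeating the count for a series of increasing window sizes shows at what width the number of nearby sites stops growing.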

Statistics 

The cytotoxicity data were analysed using one-way 
ANOVA with Dunnett's multiple comparison post test. 
For the rDNA-targeted IS study, all vector sets were 
compared with each other. Statistical analysis was performed 
using Fisher's exact test when comparing 
data sets with n < 500 with each other, and with the 
Chi-square test when the larger data sets were part of 
the comparison. Analyses were performed using 
GraphPad Prism version 5.01 for Windows (GraphPad 
Software, San Diego, CA, USA; www.graphpad.com). 
Statistics used to analyse the genomic heatmap data 
(Figure 6) are described in Brady et al. (33). 
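
The choice between the two tests for the rDNA comparison can be sketched in plain Python as follows. This is an illustration under stated assumptions, not a reproduction of the GraphPad Prism calculations: the two-sided Fisher's exact test sums the probabilities of all tables no more likely than the observed one, and the Chi-square test uses the uncorrected Pearson statistic with one degree of freedom (the text does not say whether a continuity correction was applied). All function names are hypothetical.

```python
# Hedged sketch of the test selection rule; not the Prism implementation.
from math import comb, erfc, sqrt


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def chi_square_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square statistic and P-value (1 df) for [[a, b], [c, d]]."""
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("every row and column must have a non-zero total")
    stat = n * (a * d - b * c) ** 2 / margins
    return stat, erfc(sqrt(stat / 2))  # survival function of chi-square with 1 df


def compare_rdna_fraction(hits1, total1, hits2, total2, small_n=500):
    """Pick the test by data set size, as described in the text, and return (name, P)."""
    table = (hits1, total1 - hits1, hits2, total2 - hits2)
    if total1 < small_n and total2 < small_n:
        return "fisher", fisher_exact_two_sided(*table)
    return "chi-square", chi_square_2x2(*table)[1]
```

Here `hits` stands for the number of ISs in rDNA and `total` for all ISs in a data set; any counts passed in are supplied by the caller.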

RESULTS 

Generation and characterization of LVVs containing 
the IN-I-PpoI fusion protein 

Before IN-I-PpoI was packaged into LVVs, the ability of 
the fusion protein to specifically cleave the I-PpoI recognition 
sequence was verified in a plasmid cleavage assay. 
The recombinant IN-I-PpoI fusion protein proved to be as 
efficient a restriction enzyme as the commercial I-PpoI 
enzyme used as a positive control (Supplementary 
Figure S1). The third-generation LVV packaging 
plasmids were then modified to contain the I-PpoI 
reading frame 3' to IN and used in vector production 
(Figure 1A). In addition to wild type I-PpoI, an inactivated 
version of the meganuclease was fused to IN. This 
IN-I-PpoIN119A protein was designed to allow the study of 
integration site selection in the absence of target DNA 
cleavage, which is unnecessary for IN-catalysed integration 
reactions. The remaining three plasmids used in LVV 
production were unmodified: the transfer construct, which 
forms the vector RNA genome and encodes a GFP 
marker protein under a PGK promoter; the VSV-G 
plasmid, which encodes a heterologous envelope protein 
to pseudotype the vector; and a plasmid encoding 
REV, which is needed both for the expression of the gag 
and pol genes and the accumulation of packageable vector 
transcripts (34). Fusions to IN's C terminus may be 
detrimental for integration catalysis, but IN molecules 
mutated at different domains are capable of complementing 
each other's functions to restore integration (35). 
Mixed multimer vectors containing the unmodified wild 
type IN (wt IN) or the integration-defective IND64V in 
addition to IN-I-PpoIN119A were therefore also generated. 
The sizes of the IN-fusion protein bands were as expected, 
with minimal unspecific protein degradation (Figure 1B). 
Congruent with previous studies (13,36), we found mixed 
multimers of the IN molecules obligatory for vector 
integration (Figure 2). The vectors containing wt IN in 
addition to IN-I-PpoIN119A promoted long-term transgene 
expression better than vectors complemented with 
IND64V. 



Nucleolar rDNA is cleaved after IN-I-PpoI 
protein transduction 

To test the ability of the newly created lentivirus vectors to 
gain access to and carry out specific DNA cleavage on the 
nucleolar rRNA genes, we transduced MRC-5 lung fibroblasts 
with vectors and virus-like particles (VLPs) containing 
the IN-fusion proteins. Sites of DSB formation were 
visualized through γH2A.X immunocytochemistry (37). 
Confocal microscopy analyses revealed parallel and 
overlapping localization of the DSB marker with the 
nucleolus marker fibrillarin, indicating rRNA gene cleavage 
by the IN-I-PpoI-containing vectors and VLPs (Figure 3 
and Supplementary Table S1). In addition, distinctive 
ring-like nucleoli were detected in these cells, suggestive 
of fibrillarin re-organization in response to 
nucleolus-directed DNA damage (Supplementary Table S1, 
Supplementary Figure S2). As expected, IN-I-PpoIN119A 
caused fewer and less visible DSBs. In addition to the 
well-known I-PpoI cleavage sites in the 28S rRNA gene, 
eight perfect full-length (Supplementary Table S2) and 
several degenerate (38) I-PpoI sites can be found in the 
human genome. Three of the perfect I-PpoI sites mapped 
by us in the reference assembly GRCh37.p5 (Supplementary 
Table S2) have been previously described (38,39), and 
six of them are found in LSU (large subunit) rRNA repeat 
elements. The non-nucleolar sites can be cleaved in up to 
25% of cells at 6 hours after I-PpoI enzyme induction (38). 
In line with this result, we also detected γH2A.X signals 
outside the nucleoli in cells transduced with 
IN-I-PpoI-containing LVVs or VLPs after 4-6 hours of treatment 
(Figure 3, Supplementary Figure S2 and Supplementary 
Table S1). 

Figure 2. The relative integration efficiency of different IN-modified 
vectors. HeLa cells transduced with LVVs containing different IN 
molecules were assayed by flow cytometry. For each vector the percentage of 
GFP-positive cells at different time points was normalized to the value 
of day two, when GFP expression generally reaches its highest value 
(set to 100%). The integration efficiency of different vectors can be 
evaluated by looking at values after day 10, by which expression 
from unintegrated vector genomes has dropped close to zero. 

The emergence of concurrent DSBs at many genomic 
loci can be detrimental to a cell's survival. Constitutive 
I-PpoI expression results in the cleavage of about 10% 
of the rDNA I-PpoI sites, and is known to be cytotoxic 
in human cells (21). Although lentivirus particles have 
been estimated to contain only 15-250 molecules of IN 
(40,41), we noticed a distinct morphology in LVV or 
VLP IN-I-PpoI transduced cells, indicative of I-PpoI's 
cytotoxicity. Indeed, the viability of cells transduced 
with these vectors was found to decrease after transduction 
(Figure 4). In conclusion, the IN-fusion protein 
strategy is an efficient means to package site-specific 
nucleases into LVV or VLP particles, which can deliver their 
protein cargo into transduced cell nuclei to obtain genome 
cleavage. In the case of I-PpoI, which has several cleavage 
sites both in the nucleolar rRNA genes and in other 
genomic locations, the extent of DNA cleavage was 
cytotoxic. 

Directed integration into rDNA by IN-I-PpoIN119A 

To study whether the IN-fusion protein disabled for DNA 
cleavage would have an impact on the vector integration 
site selection, MRC-5 cells were transduced with 
wt IN/IND64V + IN-I-PpoIN119A LVVs, and cellular ISs were 
extracted using LM-PCR and 454 sequencing. ISs were 
mapped to rDNA by counting BLAT hits in the 
unplaced supercontig ChrUn_gl000220 that contains one 
full and one slightly shortened rRNA gene repeat. To 
determine the level of background rDNA integration by an 
unmodified HIV-1 lentivirus vector, the abundance of ISs 
in rDNA was studied using two published vector data sets 
(Figure 5). No integrants were found in rDNA for the 
smaller data set that was generated with the same 
restriction enzymes as the data described here. For the larger 
data set, 0.1% of the vector's 40604 cellular ISs were 
localized to the rDNA. In contrast, an rDNA-targeting 
efficiency of 2.7% was found for the vectors containing 
IND64V + IN-I-PpoIN119A, a significant difference from the 
control vectors. The second IN-modified vector 
tested, wt IN + IN-I-PpoIN119A, yielded an rDNA-targeting 
efficiency of 0.2%. Its failure to target 
integration into rDNA above the background levels may 
be due to an uneven distribution of the fusion protein in 
newly formed vector particles; if LVVs containing only wt IN 
are generated along with wt IN + IN-I-PpoIN119A 
particles, these would integrate more efficiently and 
randomly, affecting the result. The intact catalytic core 
domain of the wt IN molecule may also compete in 
DNA binding with I-PpoIN119A. Integration targeting 
towards the I-PpoI recognition sites residing in the 
non-nucleolar chromosomes (Supplementary Table S2) 
was not observed for either of the IN-I-PpoIN119A-containing 
vectors. The majority of the IN-modified vectors' 
ISs in rDNA localized to the 18S and 28S rRNA genes 
(Supplementary Figure S3). For the control vector, 
many ISs also localized to parts of the contig in which no 
rRNA gene-related annotations were found. 

Figure 4. Cytotoxicity of IN-I-PpoI containing LVVs. HeLa (A) and 
MRC-5 (B) cells were transduced with two vector concentrations (2 and 
10 ng of p24 per well) of LVVs containing different IN molecules (left). 
Cellular viability was measured 24, 48 and 72 h after transduction 
(day 1, 2 and 3). Viability of the untreated cells at each time point 
is set to 100%. Viability of the vector-treated cells at a given time 
point is shown as the percentage of the untreated cells' values. 
Differences between vector-treated groups and the untreated cell 
values were analysed at each time point using one-way ANOVA 
and the Dunnett's Multiple Comparison Test. ***P < 0.001, 
**0.001 < P < 0.01, *0.01 < P < 0.05. 

Figure 5. Integration frequency in the rDNA. Previously published 
data sets of HIV-1 
integration sites are shown as a reference. All vector sets were 
compared with each other. Differences between vectors were 
statistically significant only between IND64V + IN-I-PpoIN119A and the rest of 
the vectors; the P-values for these comparisons are shown. IS, 
integration site. (a) HIV-1 vector integration sites (53) generated with the 
same restriction enzymes as the IN-fusion protein data sets. (b) HIV-1 
vector integration sites generated with the restriction enzymes AvrII 
and MseI (43). 

Nucleic Acids Research, 2013, Vol. 41, No. 5 e61 

Taken together, IN-I-PpoIN119A increased transgene 
integration into rDNA when complemented with IND64V. 
Because significantly fewer ISs were found in rDNA for the 
control vectors, this shift is attributable to the DNA 
recognition properties of I-PpoIN119A. 

IN-I-PpoIN119A changes the typical integration 
pattern of LVVs 

Dimerization and folding of the IN-attached I-PpoI may 
sterically inhibit the interaction of IN with its important 
cellular cofactor LEDGF/p75. Differences seen in the 
vectors' integration frequency with regard to specific 
genomic features argue in favour of this theory (Table 1 
and Figure 6). HIV-1 normally prefers AT-rich sequences 
close to integration sites, likely resulting from the 
DNA-binding specificity of LEDGF/p75 through its 
AT-hook (42), and disfavours integration in CpG islands 
(43). In contrast to the control vector, both of the 
IN-modified vectors showed favoured integration into 
CpG islands and GC-rich DNA close to the integration 
site (Figure 6). This shift is also seen in 
LEDGF/p75-depleted cells, where the lentiviral integration pattern 
starts to resemble that of simple retroviruses (44). The 
similarity of the change in integration pattern implies 
that I-PpoI blocks the interaction of IN with 
LEDGF/p75 or alternatively competes in DNA binding with 
LEDGF/p75, tethering IN to more GC-rich DNA. Parts 
of the rRNA genes have a high GC content and CpG 
island frequency (Supplementary Figure S3), but this is 
unlikely to explain the difference because the majority of 
vector IS reside in non-nucleolar DNA. With respect to 
integration within oncogenes, no differences were found 
between the modified vectors and the control vector 
(Table 1). 

Figure 6. Integration frequency in different genomic features. A heat map summarizes the relationships of vector integration site data sets (indicated 
above the columns) to selected genomic features (left of the corresponding row of the heat map). Tile colour indicates whether integration by 
different vectors is favored (increasing shades of red) or disfavored (increasing shades of blue) in a given feature relative to their matched random 
controls, as detailed in the colored receiver operating characteristic area scale at the bottom of the panel. The p-values shown as asterisks (*p < 0.05, 
**p < 0.01, ***p < 0.001) emerge from significant departures from the wt IN data set (53). The base pair values in the row labels indicate the size of 
the genomic interval used for analysis. Statistical methods and detailed naming of the genomic features: Berry et al. (30) and Brady et al. (33,54). 

Table 1. Summary of integration frequency in genomic features 
Integration sets and their genomic distributions are shown. Significant deviation from matched random controls is shown. Statistical analysis between 
vectors and their matched random sets: Fisher's exact test for in/out or <XX-bp based annotations and the Wilcoxon rank sum test used for Counts/ 
GC-based annotations. 
(a) Frequency: ISs/matched random controls. 
(b) The allOnco list: http://microb230.med.upenn.edu/protocols/cancergenes.html. 
*P < 0.05; **P < 0.01; ***P < 0.001. 



DISCUSSION 

Targeted integration of transgenes to predetermined 
genomic sites presents one of the most important goals 
in current vector development. The ability of DNA 
repair proteins to incorporate exogenous DNA with 
homology arms to nuclease-catalysed DSBs has been har- 
nessed in the majority of recent methods. Although good 
results can be obtained by transfecting the nuclease- 
encoding and donor DNA-carrying plasmids into cells, a 
broader applicability of the DSB-HDR mechanism 
requires better vectorization of its components. To this 
end, these sequences have been transferred into IDLVs 
(6,45-49), which promote transient expression in 
dividing cells. Such a setting can lead to site-specific 
transgene integration at high efficiency in vitro, although in 
many cell types it has remained below 5% (45-49). 
Cellular expression of the nuclease from an IDLV also 
holds potential for unwanted genotoxicity through 
off-target activity, or inadvertent integration of the 
expression construct. In addition, IDLV delivery of ZFNs 
and the donor molecule generally relies on generating and 
using three different vectors, which is impractical in terms 
of maximized transduction efficiency and is difficult to 
apply for in vivo use. 

The IN-fusion protein, or cis-packaging strategy, 
described here is a method by which both a desired 
protein and a transgene construct can be simultaneously 
delivered into transduced cells within one vector particle. 
Consequently the targeting protein does not need to be 
expressed in transduced cells, but is delivered at fixed 
amounts. In contrast to the HIV-1 Vpr-based 
trans-packaging method (7), the IN-fusion protein approach 
does not require increasing the number of plasmids 
transfected into vector-producing cells to obtain foreign protein 
incorporation. This may enhance the levels of vector 
production and avoids optimization of new transfection 
schemes. We generated vectors that contained both an 
IN-fusion protein with DNA-cleaving activity and a 
fusion protein where only the DNA-binding activity of 
I-PpoI was retained. With such vectors, we were able to 
demonstrate that the cis-packaging method is applicable 
for both the nuclear delivery of a meganuclease and 
altering the integration pattern of LVVs with increased 
transgene integration in the rDNA. 

As a target for transgene integration, rDNA seems like 
an interesting GSH candidate owing to the many unique 
features it bears in comparison with non-nucleolar DNA. 
First, rRNA genes are isolated on five short chromosome 
arms where they reside far away from protein-coding 
genes with oncogenic potential. Second, the numerous 
copies of rRNA genes can compensate for the loss of 
one gene due to transgene integration. Third, the spacer 
regions between rRNA gene repeats may prevent the 
transcriptional status of the transgene from spreading to the 
surrounding chromatin, and vice versa. rDNA clusters are 
subject to meiotic rearrangements at a high frequency, 
which leads to considerable variation in rDNA cluster 
size between healthy individuals (22). During mitosis, 
however, the gene cluster architecture is ordinarily well 
preserved (50). It is therefore unlikely that transgenes 
inserted in the rRNA genes of somatic cells will be 
eliminated, translocated or multiplied due to rDNA 
cluster recombination. Assessing the long-term stability 
of rDNA-inserted transgenes is nonetheless important to 
fully characterize the suitability of the locus as a GSH. 

With vectors containing the IN-I-PpoI(N119A) fusion 
protein, we found 2.7% of the cellular integration sites 
to localize to rDNA, which is a significant increase over 
the 0.1% observed for unmodified lentivirus vectors. 
Previously, the rDNA of human cells has been targeted 
for transgene integration using electroporated 
homologous donor molecules (51). The actual frequency of HR 
in this system (1 x 10~^) is, however, typically considered 
too low for practical applicability. When compared to a 
similar strategy that used an IN-fusion protein to target 
the cellular E2C site, our targeting efficiencies are 2- to 3-fold 
higher (15). However, differences in the numbers of analysed 
integration sites impede full comparability of the results. 
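
The influence of integration site numbers on such comparisons can be illustrated with a minimal sketch of a two-sided Fisher's exact test, the test type named in the Table 1 legend for in/out annotations. The function name and all counts below are hypothetical, chosen only to give proportions close to 2.7% and 0.1%; they are not the integration site numbers of this study or of reference 15.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows: two integration site sets. Columns: sites inside / outside a feature.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that x of the col1 "inside" sites fall in row 1
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum over all tables that are no more probable than the observed one
    p = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_obs * (1 + 1e-9)
    )
    return min(1.0, p)


# Hypothetical large data sets: 27/1000 sites in the feature versus 1/1000
p_large = fisher_exact_two_sided(27, 973, 1, 999)

# Hypothetical small data sets with a similar proportion: 3/111 versus 0/111
# (0.1% of 111 sites corresponds to fewer than one site)
p_small = fisher_exact_two_sided(3, 108, 0, 111)

print(f"large sets: p = {p_large:.3g}")
print(f"small sets: p = {p_small:.3g}")
```

With these assumed counts, a similar difference in proportions is highly significant in the large sets but not in the small ones, which is why targeting efficiencies derived from very different numbers of integration sites are difficult to compare directly.
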

Nucleases are efficient tools to acquire targeted genome 
modifications, but integration of a single vector copy in 
the correct locus may represent only a fraction of all 
possible outcomes after DSB induction. Even donor mol- 
ecules lacking homology can be incorporated into or close 
to the cleaved site with surprising efficiency through 
non-homologous end joining (NHEJ) (47). With 
homology-containing donor molecules, 10% of the 
targeted integration reactions were found to result from 
the combined action of HDR with NHEJ and 7% of the 
analysed donors had integrated randomly. Concatameric 
donor molecule insertions are also frequently seen in 
experiments targeting transgene integration into a cellular 
DSB (6,45,46). This highlights the need for in-depth 
analysis of all potential recombination events in the 
nuclease-treated cells to avoid problems that could arise 
from random integration, incorporation of unintended 
vector sequences through NHEJ or disruption of an 
already corrected sequence. IN-catalysed lentivirus 
vector integration is not known to be associated with 
concatameric insertions, enzyme-dependent cytotoxicity 
or genomic rearrangements. However, because lentiviruses 
tend to integrate within expressed genes, their applicability 
for therapeutic gene integration would be improved by 
efficiently targeting integration to safer genomic areas. 

In conclusion, the data presented here show that 
IN-fusion proteins can be used not only as an alternative to Vpr 
to package DNA-cleaving proteins into lentivirus vectors, 
but also to increase IN-catalysed transgene integration at 
a predetermined genomic locus. An IN-fusion protein 
with fewer genomic cleavage sites than found for I-PpoI 
could shed light on the question of whether this approach 
could also be used to enhance site-specific gene insertion 
through HDR and NHEJ. In addition, the efficiency of 
IN-catalysed integration targeting may be increased with 
alternative DNA-binding proteins targeting the rDNA or 
other genomic sites proposed as GSHs (52). 